Journal of NeuroEngineering and Rehabilitation

Springer Science and Business Media LLC

Preprints posted in the last 7 days, ranked by how well they match Journal of NeuroEngineering and Rehabilitation's content profile, based on 28 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit.

1
Development of an Open-Access Action Observation Video Library for Upper Limb Motor Rehabilitation

Madison, M.; Wheaton, L. A.; Rowe, V.

2026-06-10 rehabilitation medicine and physical therapy 10.64898/2026.06.10.26355108 medRxiv
Top 0.1%
14.8%

Background: Occupational therapists can improve stroke survivors' hand and arm movement and participation in daily activities through action observation (AO). AO involves watching another person's hand or arm complete a movement or task. While research generally supports the use of AO with stroke survivors, few AO videos are available to occupational therapists, which makes applying AO challenging. Objective: The purpose of this work is to develop a structured and widely accessible tool to support access to AO for stroke survivors, occupational therapists, and researchers. Methods: To develop an AO video library for stroke rehabilitation, functional and non-functional upper limb task deficits were first identified through clinical observations and clinician interviews to establish a prioritized list of daily activities. In collaboration with media production specialists, healthy adult volunteers were recruited and filmed performing these tasks from both first- and third-person perspectives. The recorded videos were then systematically edited, enhanced with instructional title slides, and distributed via a public YouTube channel for clinical application and a categorized digital repository for research purposes. Results: Initial assessments revealed a complete lack of familiarity, awareness, and utilization of AO resources among local occupational therapists, despite high perceived clinical utility. To address this gap, a final library of 150 tasks was established, resulting in the production of 419 finalized, standardized videos featuring six healthy volunteers. For clinical application, these videos were hosted on a free, public YouTube channel organized into 18 functional playlists, while a parallel set was structured into distinct movement categories for research repository storage.
Conclusion: By providing a structured and highly accessible tool, this repository enables clinicians, researchers, and caregivers to readily implement evidence-based action observation interventions in both clinical and home settings.

2
Registered Report: Artifact Index for Capacitive Electrocardiography Acquired with an Armchair

Warnecke, J. M.; Baumgärtel, D.; Bollmann, J.; Deserno, T. M.

2026-06-09 health informatics 10.64898/2026.06.03.26353526 medRxiv
Top 0.4%
2.7%

Background Continuous health monitoring enables early detection of diseases and improves therapeutic outcomes. Non-intrusive biosignal sensors, such as capacitive ECG (cECG), offer a practical solution for daily monitoring in private environments, such as smart homes and vehicles. However, artifacts reduce signal quality and compromise reliability. Methods Following a registered report protocol (Warnecke JM et al. PLoS One. 2021; 16(7):e0254780), we record data from 44 subjects and develop an artifact index for cECG. We use three signal quality indices (SQIs): the correlation of QRS complexes (corSQI), the R-peak detection consistency (bSQI), and the absolute amplitude ratio (aSQI). Our index classifies overlapping 10 s segments with a step width of 2 s into clean or artifact segments. We label a 2 s interval as an artifact if all five overlapping segments indicate artifacts. We record cECGs using an armchair with integrated electrodes in a single-arm study involving 44 subjects performing two activities, reading and watching television (TV), for 11 minutes each. We record a time-synchronized reference ECG with skin electrodes on the chest. To evaluate the artifact index, we compare it with manually generated ground truth. Moreover, we evaluate the clothing materials cotton, linen, jeans, and polyester in 5 subjects. Results Watching TV results in longer, continuously clean signal durations than reading. On average, 88.3% of the signal recorded while watching TV has a minimum continuous clean duration of 10 s, versus 79.8% during reading. All clothing configurations achieve a clean signal duration exceeding 10 s. Among the SQI metrics, bSQI performs best, achieving an accuracy of 90.7% and an F1 score of 79.9%. Combining the three SQI metrics in a voting approach improves accuracy to 92.0% and F1 score to 82.1%.
Discussion Our artifact index automatically distinguishes clean from artifact cECG segments, promoting health monitoring in unsupervised real-world settings, earlier disease detection, and preventive health management. A limitation is the investigation of only two scenarios (reading and watching TV).
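
The segment-to-interval labeling described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Two points are assumptions not stated in the abstract: the voting rule is taken to be a simple majority (at least two of three SQIs), and intervals at the start or end of a recording, which are covered by fewer than five segments, are judged on the segments that are available. The function names `vote_segments` and `label_intervals` are hypothetical.

```python
from typing import List


def vote_segments(cor_flags: List[bool], b_flags: List[bool], a_flags: List[bool]) -> List[bool]:
    """Assumed majority vote: a segment is an artifact if >= 2 of the 3 SQIs flag it."""
    return [(c + b + a) >= 2 for c, b, a in zip(cor_flags, b_flags, a_flags)]


def label_intervals(segment_is_artifact: List[bool], seg_len_s: int = 10, step_s: int = 2) -> List[bool]:
    """Label each step-sized interval as artifact (True) or clean (False).

    Segment k is taken to cover [k * step_s, k * step_s + seg_len_s) seconds,
    so an interior interval is overlapped by seg_len_s // step_s segments.
    """
    per_interval = seg_len_s // step_s  # 5 for 10 s segments with a 2 s step
    n_segments = len(segment_is_artifact)
    if n_segments == 0:
        return []
    n_intervals = n_segments + per_interval - 1
    labels = []
    for i in range(n_intervals):
        first = max(0, i - per_interval + 1)
        last = min(n_segments - 1, i)
        # Artifact only if every segment overlapping this interval is flagged.
        labels.append(all(segment_is_artifact[first:last + 1]))
    return labels
```

Under this rule a single clean segment is enough to mark every interval it overlaps as clean, which makes the interval labels conservative about declaring artifacts.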

3
Sensorimotor recovery and neuropathic pain reduction after remotely delivered cognitive multisensory rehabilitation or remotely delivered exercise in adults with spinal cord injury: a pilot clinical trial.

Van de Winckel, A.; Herrmann, A. A.; Carpentier, S. T.; Bottale, S.; Lopez, R. L.; Rapacz, A. D.; Larson, S. J.; Deng, W.; Zhang, L.; Hendrickson, T. J.; Mueller, B. A.; Nourian, R.; Morse, L. R.; Lim, K. O.

2026-06-09 rehabilitation medicine and physical therapy 10.64898/2026.06.02.26354574 medRxiv
Top 0.5%
1.8%

Introduction: Reduced or lost sensation and movement after a spinal cord injury (SCI) impairs the brain's ability to accurately localize paralyzed body parts, causing deficits in its internal body map, or mental body representations (MBR). These deficits hinder functional recovery and contribute to neuropathic pain. Medications for neuropathic pain are often ineffective and carry side effects. Our pilot trials found that in-person Cognitive Multisensory Rehabilitation (CMR), a physical therapy restoring MBR, led to prolonged pain reduction, improved sensorimotor function, and enhanced brain function, to a greater extent than adaptive fitness. To explore more accessible interventions for those in rural areas or with transportation challenges, we examined whether 12 weeks of remotely delivered CMR or exercise would (1) improve function and reduce pain; (2) increase brain activity and connectivity related to sensorimotor function and MBR in adults with SCI. Methods: Of 19 adults with SCI who consented, 15 (51 ± 15 years old, 8 ± 10 years post-SCI) were randomized to 12 weeks of remotely delivered CMR or exercise (45 min, 3x/week). Eight reported neuropathic pain intensity of at least 3/10. The Numeric Pain Rating Scale (NPRS), ASIA Impairment Scale (AIS), and Neuromuscular Recovery Scale (NRS) assessed pain and sensorimotor function at baseline, post-intervention, and 6-month follow-up. Functional MRI included resting-state and four tasks: imagining feeling the left leg, imagining moving the left leg, whole-body movement imagery, and a sensation task. Results: After CMR (n=8), participants improved on AIS (large effect sizes: touch: d=1.30; pinprick: d=1.21; lower limb motor function: d=1.83). Exercise (n=7) produced smaller improvements (touch: d=0.35; pinprick: d=0.36; lower limb motor function: d=0.80). CMR showed greater NRS effect sizes (core: d=1.48; upper limb: d=0.69; lower limb: d=1.25) than exercise (core: d=0.31; upper limb: d=0.74; lower limb: d=0.83).
Benefits persisted at follow-up for both AIS and NRS, especially in the CMR group. Highest neuropathic pain intensity decreased in both groups post-intervention (CMR: d=-0.61; exercise: d=-0.73) and at 6-month follow-up (CMR: d=-0.55; exercise: d=-0.55). Unlike previous studies, group effects for CMR were not found due to high heterogeneity. Increased task-based activation, including in the lateral occipital cortex involved in visual body perception and spatial awareness, was seen for the exercise group (n=5). Discussion: These preliminary results support the potential of remotely delivered CMR and exercise to improve function and reduce neuropathic pain in adults with SCI, highlighting the need for larger trials. ClinicalTrials.gov: NCT05870189

4
PhysiCase: Development and dual-layer validation of synthetic cases for health professional education: A pilot study leveraging Generative AI

Komolafe, O. O.; Roberts, A. C.; Shelley, J.; Tawiah, A. K.

2026-06-09 rehabilitation medicine and physical therapy 10.64898/2026.06.07.26355114 medRxiv
Top 0.6%
1.4%

High-quality, domain-specific datasets are foundational to advancing educational tools and AI systems in healthcare, yet assembling case repositories from real-world clinical records faces substantial privacy, ethical, and licensing barriers. Synthetic data generation offers a compelling pathway forward, but educational cases require rigorous validation to ensure clinical plausibility and pedagogical utility. This pilot study introduces PhysiCase, a dual-layer validation pipeline for synthetic case generation, and evaluates the feasibility of combining automated LLM-based screening with expert educator review. We generated 128 synthetic musculoskeletal (MSK) cases using four frontier large language models (GPT-4.1, GPT-4o, Google Gemini 2.5 Pro, and Llama 4 Scout) across 28 clinical conditions. Cases underwent automated quality screening using an "LLM-as-judge" framework (DeepEval) assessing prompt alignment, JSON correctness, answer relevance, bias, toxicity, and completeness. Ninety cases (70.3%) passed automated filtering and proceeded to expert evaluation by four MSK physiotherapy educators, who rated medical accuracy, realism, fidelity, relevance, and usability on 5-point Likert scales. GPT-4.1 demonstrated the highest automated pass rate (96%) and strongest expert ratings (medical accuracy 4.10/5, usability 4.38/5), while Llama 4 Scout showed the lowest pass rate (33.3%) and expert ratings. Expert-evaluated cases achieved strong content validity indices for usability (97.5%), relevance (97.5%), and realism (95%), though medical accuracy showed greater variance (CVI 87.5%). Cross-layer correlation analysis revealed that automated completeness metrics moderately aligned with expert usability ratings, while answer relevance and prompt alignment showed weak or negative correlations with clinical correctness. Qualitative analysis identified three primary failure modes: reductive logic, biomechanical inconsistency, and administrative/contextual gaps.
The dual-layer validation framework proved methodologically viable: automated screening efficiently reduced expert review burden, while human judgment remained indispensable for detecting subtle clinical reasoning failures. LLM-generated synthetic cases have the potential to meet practical educational needs for MSK physiotherapy, but expert validation is essential to safeguard clinical accuracy. These findings support a scalable division of labour for synthetic case development, with targeted improvements to prompting and automated reasoning checks needed to address identified "nuance gaps." The code for this paper is available at https://github.com/kwid-ai/PhysiCase

5
Surviving Severe Acute Brain injury: Care trajectories and missed opportunities

Bunker, A. L.; Engelberg, R. A.; Holloway, R. G.; Creutzfeldt, C. J.

2026-06-09 neurology 10.64898/2026.06.01.26354480 medRxiv
Top 0.6%
1.3%

INTRODUCTION Severe acute brain injury (stroke, traumatic brain injury or hypoxic-ischemic encephalopathy; SABI) is increasingly recognized as a chronic condition with care and communication needs beyond the initial hospitalization. This study aimed to characterize post-acute care patterns among SABI survivors, focusing on healthcare utilization and outpatient communication. METHODS Data were collected from a prospective cohort of hospitalized SABI patients using surveys, chart reviews, and the ED Information Exchange database. Socioeconomic disadvantage was assessed using the Area Deprivation Index (ADI), and qualitative analysis of outpatient notes examined conversations around palliative care needs and goals-of-care. RESULTS Two-thirds of patients (140/222) survived until discharge, primarily to nursing facilities (39%) or inpatient rehabilitation (38%). Among 109 with one-year follow-up, there were 89 hospitalizations, 104 ED visits, and 28 deaths. Patients from the most disadvantaged neighborhoods had significantly higher odds of rehospitalization or ED use within 30 days (OR 3.37, p=0.036). ADI was not linked to one-year utilization. Survivors were seen as outpatients by primary care (40%), neurology/neurosurgery (57%), and palliative care (1%), but conversations rarely revisited prognosis or goals-of-care. CONCLUSIONS Our findings highlight the need for improved long-term care planning and communication, particularly for socioeconomically disadvantaged survivors of SABI.

6
Cortical activity during narrative discourse production in individuals with post-stroke aphasia and controls measured via functional near-infrared spectroscopy

Braun, E. J.; Carpenter, E. A.; Gao, Y.; Yucel, M. A.; Boas, D. A.; Kiran, S.

2026-06-10 rehabilitation medicine and physical therapy 10.64898/2026.06.05.26354921 medRxiv
Top 0.6%
1.3%

Introduction: Aphasia is an acquired language disorder with a significant negative functional impact. Much of the research on aphasia has focused on word-level language comprehension and production. Further evaluation of discourse-level tasks, both at behavioral and neural levels, will allow for an ecologically valid understanding of the functional implications of language impairment in this population. Method: This study evaluated bilateral frontal, temporal, and parietal cortical activity during computer-based narrative production in 14 young neurotypical individuals, 17 individuals with post-stroke aphasia, and 15 age-matched neurotypical participants using functional near-infrared spectroscopy (fNIRS). Oxygenated hemoglobin (HbO) was measured during narrative production following short video clips and compared to HbO during counting aloud. In addition, behavioral measures quantifying in-task performance were correlated with averaged HbO values. Results: Young neurotypical individuals showed greater cortical activity in bilateral language regions for narrative production compared to counting aloud. In contrast, people with aphasia showed positive condition-related effects in the right frontal ROI, and the age-matched group showed positive condition-related effects in the left frontal and right precentral ROIs. Each group showed different patterns in relationships between cortical activity and discourse performance measures. Conclusion: Overall, young participants showed more consistent condition-related effects for narrative discourse production than individuals with aphasia and age-matched controls. This study shows the potential for fNIRS to evaluate cortical activity for ecologically valid language tasks in individuals with post-stroke aphasia.

7
Assessment of safe wheeled walker use in frail older adults: Development of a video-based rating instrument

Leonhardt, R.; Lindemann, U.; Schneider, M.; Rapp, K.; Klenk, J.

2026-06-08 geriatric medicine 10.64898/2026.06.04.26354904 medRxiv
Top 0.7%
1.3%

Background: Wheeled walkers can improve safety during walking, but improper use may increase fall risk among frail older adults. No suitable tool exists to assess safe indoor wheeled walker use in this population. This study aimed to develop and validate a video-based expert assessment tool. Methods: Based on the literature and expert consensus, seven problematic indoor situations were identified, and an assessment tool with five safety criteria per situation was developed (maximum score = 35). Fifty participants (mean age 83.9 years, 64% women) from a geriatric rehabilitation clinic and a nursing home were video-recorded while using a rollator. Expert ratings were compared with nursing staff ratings, self-ratings, and the Timed Up and Go test to evaluate validity. Intra- and inter-rater reliability were determined from independent ratings by two physiotherapists and a repeated expert rating after seven days. Sensitivity to change was assessed after two weeks of rehabilitation, and feasibility by the time required for assessment. Results: The expert score of rater 1 at baseline was 28.5 points, and assessment required a mean of 17.5 minutes. Intra-rater reliability was excellent (ICC = 0.98) and inter-rater reliability was good (ICC = 0.80). Validity analyses showed the strongest association with nursing staff assessments (r = 0.74) and a moderate association with the Timed Up and Go test (r = -0.45). After two weeks, patients improved by an average of 2.38 points (8.4% of baseline score). Conclusions: The new instrument demonstrated high reliability, acceptable validity, sensitivity to change, and good feasibility for assessing safe wheeled walker use in frail older adults. Trial registration number and date of registration: DRKS00038358, 07/11/2025

8
Quality and Safety profiles of AI-Generated vs Clinician-Generated Handoffs in Hospital Medicine

Shah, K. P.; Airan Javia, S.; Savage, T.; Bressman, E.

2026-06-08 health informatics 10.64898/2026.06.05.26354946 medRxiv
Top 0.7%
1.2%

End-of-rotation handoffs are critical for patient safety but add to documentation burden for hospitalists. Generative artificial intelligence (AI) may help automate handoff creation using electronic health record data, but its impact on quality and safety is unclear. Methods: We developed an AI handoff tool with a large language model using clinical notes as input and conducted a retrospective evaluation comparing AI-generated and clinician-authored handoffs. Handoffs were assessed across domains of quality and safety through a structured review. Results: Quality ratings were similar between AI and human handoffs (3.7 vs. 3.5, p=0.57). AI-generated handoffs were rated higher for organization (4.4 vs. 4.1, p=0.05) and completeness (4.1 vs. 3.6, p=0.01), but lower for conciseness (3.7 vs. 4.1, p=0.03) and accuracy (4.1 vs. 4.4, p=0.03). Error rates were comparable (0.3/handoff in both groups); however, AI-generated handoffs included inaccuracies (9% of AI errors) and hallucinations (1% of AI errors), while clinician-authored handoffs contained only omissions. Conclusion: Human and AI handoffs have differing error profiles and tradeoffs between completeness and conciseness. Prospective evaluation in clinical workflows is underway.

9
An AI-assisted feasibility evaluation of three photoplethysmography-derived microvascular reactivity signals in MIMIC-IV-WDB v0.1.0

Landry, T. C.; Kim, Y.

2026-06-06 health informatics 10.64898/2026.06.03.26354863 medRxiv
Top 0.8%
0.9%

Background. Capillary refill time, an examiner-dependent bedside test of distal microvascular perfusion, has become a resuscitation target in septic shock [1-4], motivating a continuous surrogate computed from the photoplethysmogram (PPG, the optical waveform the pulse oximeter on every ICU patient already records) [5-8]. Objective. We attempted three PPG-derived candidate measures on the MIMIC-IV Waveform Database (MIMIC-IV-WDB v0.1.0) and asked, by inspecting randomly drawn examples, whether each captured its intended physiology before any downstream modeling. Methods. MIMIC-IV-WDB v0.1.0 [9] was linked to MIMIC-IV [10]. The signals were a cuff-anchored perfusion-index recovery (reactive hyperemia when the cuff shares an arm with the probe), a slow Mayer-wave-band power ratio of the perfusion index (sympathetic vasomotor tone), and a per-beat diastolic exponential decay time constant (a refill-like recovery time). For each signal we drew 10 random examples at a fixed seed and checked them against a checklist fixed in advance. Each was read by the author and, separately, by MedGemma 1.5, a multimodal medical language model run locally. A synthetic test with a known time constant checked the third signal. Results. The cuff-anchored signal showed the expected occlusion-reperfusion shape on 268 of 6,236 evaluable cuff cycles (4.30%) in 15 of 19 patients, consistent with opposite-limb placement of the probe and cuff. The slow-band ratio returned a stable cohort value, but a clear, stationary peak appeared in only 4 of 10 random windows. The per-beat fit met its goodness-of-fit threshold in 10 of 10 beats, yet a cardiac-frequency heuristic flagged a possible fit on the heart-rate oscillation in 7 of 10, and in 5 of 17 patients the time constant lay where an exponential is indistinguishable from a straight line. A 0.5 Hz high-pass pre-filter implanted its own approximately 318 ms time constant regardless of truth.
The language model tracked the human on clear positives but reported the pattern present on every call it returned, never absent. Conclusions. Two of the three candidate signals did not reflect their intended physiology in most examples, and the third was constrained by sensor placement. Inspecting a few random raw inputs against a checklist written in advance is an inexpensive upstream check before downstream inference on PPG-derived microvascular signals.
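
The roughly 318 ms figure reported for the 0.5 Hz high-pass pre-filter matches the time constant of a first-order high-pass filter, tau = 1 / (2 * pi * f_c). The abstract does not state the filter order, so the first-order form is an assumption here, and the function name `highpass_time_constant` is hypothetical. A minimal check in Python:

```python
import math


def highpass_time_constant(cutoff_hz: float) -> float:
    """Time constant tau = 1 / (2 * pi * f_c), in seconds, assuming a first-order high-pass."""
    if cutoff_hz <= 0:
        raise ValueError("cutoff frequency must be positive")
    return 1.0 / (2.0 * math.pi * cutoff_hz)


tau_s = highpass_time_constant(0.5)
print(f"{tau_s * 1000:.0f} ms")  # prints "318 ms"
```

Any exponential fit applied after such a filter could recover this value whether or not the underlying signal decays at that rate, which is consistent with the artifact the abstract describes.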

10
Effect of levodopa treatment on gait in older adults with mild parkinsonian signs

Pongmala, C.; Roytman, S.; van Emde Boas, M.; Vangel, R.; Rosano, C.; Bohnen, N.

2026-06-06 geriatric medicine 10.64898/2026.06.04.26354926 medRxiv
Top 0.8%
0.9%

Background Slow walking in older adults with mild parkinsonian signs (MPS) is a complex, multifactorial phenomenon arising from the cumulative burden of subclinical age-associated pathologies. This decline reflects age-associated neuronal loss in the dopaminergic system. A recent study suggests that levodopa treatment may enhance gait parameters. The goal of this small pilot study is to explore the effect of levodopa treatment on slow walking gait in older adults with MPS. Method This study was a randomized, placebo-controlled clinical pilot trial. Slow walking older adults without clinical evidence of PD were recruited and randomized into 2 groups (active treatment group or placebo control group). Participants in the active group were pre-treated with carbidopa for three days, followed by carbidopa-levodopa for seven days. Spatiotemporal gait parameters were evaluated at baseline and post-intervention. Results Gait factor analysis identified three main factors explaining gait characteristics at baseline, which included gait efficiency, gait rhythmicity, and gait turning. No effect of treatment was observed in the placebo group (p=0.111, p=0.616), no group difference was observed between the placebo and active group at baseline (β=0.310, p=0.547), but a strong trend for a treatment-related increase was observed in the active treatment group (β=0.506, p=0.076). Conclusion Our preliminary data suggest that sustained levodopa treatment (one week) in conjunction with carbidopa pre-treatment and concomitant carbidopa supplementation is feasible in slow walking older adults with MPS. Moreover, the data indicate potential efficacy, showing improvements in cadence and step durations.

11
Computer Vision Scoring of Figure Copy and Recall

Woods, D. L.; Hall, K.; Jaramillo, I.; Blank, M.; Geraci, K.; Boghassian, A.; Pebler, P.

2026-06-11 neurology 10.64898/2026.06.10.26355298 medRxiv
Top 0.8%
0.8%

Objective. Figure copy and recall tests are sensitive measures of visuoconstruction and visual episodic memory, but their clinical use is constrained by labor-intensive manual scoring. We developed and validated an automated, element-level scoring pipeline using Vertex AI object detection for the tablet-based figure copy and recall tasks in the California Cognitive Assessment Battery (CCAB). The automated scoring pipeline duplicated the scoring procedures used by expert manual raters. Methods. A normative sample of 2,011 community-dwelling adults aged 18-90 completed figure copy and delayed recall trials at baseline, with subsamples retested at 1 day and at 6, 18, and 30 months. Participants completed the drawings with their index finger on a tablet computer, with finger position digitized to analyze the speed and timing of individual drawing strokes. A convolutional object-detection model trained on the Vertex AI AutoML Vision platform identified each of twelve canonical figure elements in rendered drawings. Separate element presence and location scores were computed after homographically warping drawings onto a canonical template to produce trial-level Element, Location, and Total scores. To compare Vertex and human scores, Vertex AI and expert human raters independently scored 1,500 randomly selected drawings to evaluate inter-rater agreement, including a common subset of 100 drawings scored by Vertex AI and all raters. Results. Total scores were virtually indistinguishable (r = 0.966) from human-human agreement (mean r = 0.971) as were Element presence scores (mean r = 0.959 vs. r = 0.963). Location-score agreement (r = 0.951) was slightly below the human-human mean (r = 0.972) due to pixel-level analysis by Vertex AI that was impossible for human raters. The Vertex pipeline showed no preferential advantage for the single expert rater who categorized Elements during training.
Automated scores showed strong demographic gradients: age effects on Recall (r = -0.32) were approximately twice those in Copy conditions (r = -0.16). A Memory Cost score (Recall - Copy) showed a monotonic age-related decline from +0.40 z in the youngest subjects to -0.54 z in the oldest. Kinetic analysis revealed that drawing speed and efficiency showed significant age-related changes. Overnight test-retest reliability was high (Recall r = 0.72) and the Recall trial showed a large overnight learning effect (Δ = +1.18) that continued with repeated tests up to 30 months (Δ = +0.75).

12
Room-Specialized Mixture-of-Experts for In-Home ADL Recognition with Ambient Sensors

Addepalli, V. r.; Rao, P.; Kiselica, A.; Kummerfeld, E.; Abdalnabi, N.; Lee, K.

2026-06-12 health informatics 10.64898/2026.06.10.26355390 medRxiv
Top 2%
0.3%

Monitoring activities of daily living (ADLs) in the home is a promising approach for tracking dementia progression in older adults. While ambient sensor-based ADL systems are well-studied, most existing ADL recognition systems rely on globally trained models that ignore the spatial organization of in-home activities. In real deployments, where training data are sparse and highly home-specific, global transformer models may fail to capture room-dependent behavioral structure. We propose a deterministic Mixture of Experts (MoE) architecture for in-home ADL recognition, in which each expert is a compact transformer specialized to one room of the home (bedroom, kitchen, bathroom, living area). Input segments are routed using a deterministic gating strategy based on room-level motion activity and time-of-day priors for sleep-related behaviors. Unlike learned routing networks, the proposed gate encodes domain knowledge about where ADLs are likely to occur, reducing model complexity under limited per-home training data. By decomposing ADL recognition into room-specific activity spaces, the proposed architecture reduces competition between dominant and low-frequency activities under highly imbalanced residential data. We evaluated the system on data collected via low-cost ambient sensors (motion, light, temperature, humidity) and Raspberry Pi edge devices across five homes, with ground-truth ADL labels provided by participants and caregivers. Across the five homes, the proposed MoE consistently outperformed global transformer, 1D CNN, and Random Forest baselines, achieving macro-F1 scores ranging from 0.60 to 0.88, highlighting the importance of home-specific modeling in real-world deployments. These findings suggest that room-aware expert specialization may provide a practical and interpretable strategy for low-data ADL recognition in real-world residential environments.
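
To make the idea of a deterministic (non-learned) gate concrete, the following Python sketch routes one input segment to a room expert from per-room motion counts and the hour of day. It is an illustration under stated assumptions, not the authors' gate: the sleep window (22:00 to 06:00), the daytime fallback to the living area, the tie-breaking order, and the names `route_segment` and `is_sleep_time` are all hypothetical, since the abstract gives no such details.

```python
from typing import Dict

# Room experts named in the abstract; identifiers here are illustrative.
ROOMS = ("bedroom", "kitchen", "bathroom", "living_area")

# Hypothetical sleep window on a 24 h clock; the abstract gives no values.
SLEEP_START_HOUR = 22
SLEEP_END_HOUR = 6


def is_sleep_time(hour: int) -> bool:
    """True when the hour falls inside the assumed overnight sleep window."""
    return hour >= SLEEP_START_HOUR or hour < SLEEP_END_HOUR


def route_segment(motion_counts: Dict[str, int], hour: int) -> str:
    """Pick the room expert for one segment using fixed rules only."""
    total_motion = sum(motion_counts.get(room, 0) for room in ROOMS)
    if total_motion == 0:
        # Time-of-day prior: a motionless segment at night is treated as sleep.
        # The daytime fallback is an arbitrary choice made for this sketch.
        return "bedroom" if is_sleep_time(hour) else "living_area"
    # Otherwise route to the room with the most motion events;
    # ties resolve to the earliest room in ROOMS.
    return max(ROOMS, key=lambda room: motion_counts.get(room, 0))
```

Because the gate has no trainable parameters, each per-room expert is the only component that must be fit to a home's sparse data, which is the complexity reduction the abstract points to.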

13
Validity and Limitations of the Empatica E4 Wristband for Autonomic and Thermoregulatory Sleep Monitoring Against Concurrent Polysomnography: A Wearanize+ Dataset Study

Parry, Y. D.; Briganti, G.

2026-06-11 health informatics 10.64898/2026.06.10.26355348 medRxiv
Top 2%
0.3%

The Empatica E4 wristband provides continuous multi-modal physiological monitoring, including blood volume pulse (BVP), electrodermal activity (EDA), and skin temperature (TEMP), but its validity for sleep-stage-specific autonomic and thermoregulatory monitoring has not been systematically evaluated against concurrent polysomnography (PSG). Using the Wearanize+ dataset, which provides synchronised PSG, Empatica E4, and Zmax EEG recordings from 100 home-recorded participants, a systematic validation of Empatica E4 physiological signals against PSG ground truth across five sleep stages was conducted. Of 100 participants, 92 had Empatica data; 69 met Zmax EEG signal quality criteria and formed the analysis sample. Heart rate (HR) from the pre-computed Empatica HR channel showed valid stage-specific patterns (Wake: 70.9 bpm, N3: 61.2 bpm) and moderate inter-device MeanNN correspondence with PSG ECG (Spearman r=0.35-0.42 across stages). Skin temperature showed the expected thermoregulatory pattern (Wake: 33.92 °C, N3: 35.48 °C) and is recommended for downstream analyses. Tonic EDA showed an inverted stage pattern attributable to wrist sweat accumulation during deep sleep, representing a known confound for wrist-worn EDA during sleep. Phasic EDA showed plausible patterns and may be used with caution. These findings establish a validated feature set for Empatica E4 sleep research and directly inform multimodal psychiatric biomarker studies using the Wearanize+ dataset.

14
A Data-Driven Framework for Generating Population-Linked Case Vignettes from Nationwide Triage Data

Seidel, A.; Steiger, E.; Schuster, J.; Kroll, L. E.

2026-06-10 health informatics 10.64898/2026.06.08.26354886 medRxiv
Top 2%
0.3%

Background: Digital decision-support tools such as triage systems and symptom checkers support millions of health-related decisions each year. Their quality and safety are commonly evaluated using textual patient cases, known as case vignettes. However, existing vignette sets written by medical experts cover only a limited spectrum of real-world patient presentations and lack population weights, which would allow extrapolating evaluation results to the underlying patient population. Objective: This study aims to develop a data-driven framework for automatically generating a human-manageable set of case vignettes from nationwide triage data that captures broad presentation diversity and links each vignette to a quantitative weight reflecting the number of underlying patient assessments. Methods: From 3.2 million triage assessments conducted over one year using structured triage software in the German medical on-call service (telephone triage and online self-triage) and at the joint contact points of the outpatient emergency care service and hospital emergency departments, we randomly sampled 50,000 cases. Triage questionnaires were converted into semantic embeddings using a German Sentence Transformer Model and grouped by agglomerative clustering. For clusters containing sufficient assessments, we generated one representative assessment using a two-phase simulated-annealing optimization. The optimization minimized the distance to the cluster centroid while maximizing the number of answered triage questions, aiming for high representativeness and information content. Each representative assessment was assigned the size of its source cluster as its sample-based weight. A similarity-based sensitivity analysis was performed to examine whether these weights were preserved in the full 1-year population. 
Finally, the question-answer pairs of the representative assessments were converted into structured textual case vignettes using controlled prompting of a large language model. Results: The cluster analysis yielded 514 included clusters covering 96.8% of the sampled 50,000 assessments. The generated representatives showed strong agreement with the majority treatment-urgency recommendation of their source cluster (Spearman's ρ=0.78, p<0.001) and contained on average 4.3 more answered triage questions than the original assessments within their clusters. When weighted by cluster size, the representatives approximated the sample distributions of treatment urgency, demographics, and symptoms, although some systematic deviations remained, most notably an overrepresentation of female cases (+13.5%), patients aged 14-49 years (+8.0%), and the urgency category "As soon as possible" (+6.6%). Of 121 recorded symptoms, 101 (83.5%) were covered by the representatives; the rest each occurred in <0.5% of the sample. In a sensitivity analysis, cluster-based vignette weights were strongly correlated with similarity-based population weights (Spearman's ρ=0.77, p<0.001), and 90.1% of assessments in the full 1-year population were matched to at least one vignette. Conclusions: We present a data-driven framework for deriving a manageable set of population-weighted case vignettes from nationwide triage data. The resulting vignettes captured broad presentation diversity, approximated key sample characteristics, and provided an explicit quantitative link to the number of underlying patient assessments. After medical expert review and refinement, the vignettes may support more population-aware evaluation and quality assurance of digital decision-support tools.
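
The cluster-size weighting described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_size` cutoff, and the toy data are hypothetical, and picking the existing member nearest to the centroid is a much simpler stand-in for the two-phase simulated-annealing optimization the abstract reports.

```python
from collections import Counter


def cluster_weights(labels, min_size):
    """Map each sufficiently large cluster to its size (its sample-based weight).

    `min_size` is a hypothetical cutoff; the abstract only says clusters
    needed "sufficient assessments".
    """
    sizes = Counter(labels)
    included = {c: n for c, n in sizes.items() if n >= min_size}
    # Share of all sampled assessments that fall into an included cluster.
    coverage = sum(included.values()) / len(labels)
    return included, coverage


def nearest_to_centroid(vectors):
    """Return the index of the vector closest to the cluster centroid.

    Simplified stand-in: the abstract generates a new representative by
    simulated annealing rather than selecting an existing member.
    """
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    def sq_dist(v):
        return sum((a - b) ** 2 for a, b in zip(v, centroid))

    return min(range(len(vectors)), key=lambda i: sq_dist(vectors[i]))
```

Called with toy labels such as `[0, 0, 0, 1, 1, 2]` and `min_size=2`, `cluster_weights` keeps clusters 0 and 1 with weights 3 and 2 and reports that five of the six assessments are covered.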

15
AutoClip: AI-Guided TEE Semantic Segmentation for TEER: A Proof-of-Concept Study

Chen, M.; Li, X.; Yang, K.; Taramasso, M.

2026-06-06 cardiovascular medicine 10.64898/2026.05.29.26354195 medRxiv
Top 2%
0.3%

**Abstract** **Background:** Transcatheter edge-to-edge repair (TEER) is an established treatment for mitral regurgitation but remains highly dependent on operator experience and complex transesophageal echocardiography (TEE)-guided intraprocedural imaging. Artificial intelligence (AI)-based semantic segmentation may improve procedural reproducibility and intraprocedural guidance; however, no TEER-specific segmentation framework has been reported. **Objectives:** To develop and evaluate AutoClip, a clinician-driven AI-guided TEE semantic segmentation model designed for simultaneous delineation of mitral valve anatomy and in-vivo TEER device components. **Methods:** A retrospective proof-of-concept study was conducted using 987 intraprocedural TEE frames derived from 10 video clips in 3 patients undergoing MitraClip G4 implantation. Seven semantic labels, including mitral leaflets and device components, were manually annotated using ITK-SNAP. Following standardized preprocessing and region-of-interest extraction, an Attention U-Net architecture was trained frame-wise on bicommissural and corresponding X-plane TEE views. Model performance was assessed using mean intersection-over-union (IoU) and Dice coefficient on an independent test set. **Results:** The Attention U-Net demonstrated improved sensitivity to small device structures compared with conventional U-Net architectures. Preliminary training performance achieved a mean IoU of approximately 0.93, while independent test performance reached a mean IoU of 0.46 across foreground classes. Qualitative assessment demonstrated feasible simultaneous segmentation of mitral leaflets, clip arms, grippers, and delivery shaft during TEER procedures. **Conclusions:** AutoClip represents a proof-of-concept TEER-specific TEE semantic segmentation framework initiated through a clinician-oriented workflow without formal computer science expertise. 
Although preliminary accuracy remains modest due to limited sample size, this study establishes a reproducible pathway for future AI-assisted intraprocedural guidance systems and larger multicenter development efforts in structural heart interventions.
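
For readers unfamiliar with the two metrics reported here, the following is a minimal sketch of per-class IoU and Dice and of a mean IoU over foreground classes. It assumes masks are flattened into lists of integer labels with 0 as background; these conventions and the function names are illustrative assumptions, not details taken from the study.

```python
def iou_and_dice(pred, truth, label):
    """Compute IoU and Dice for one class label over flattened masks.

    Returns None when the class is absent from both masks, because both
    scores are undefined in that case.
    """
    inter = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    pred_n = sum(1 for p in pred if p == label)
    truth_n = sum(1 for t in truth if t == label)
    union = pred_n + truth_n - inter
    if union == 0:
        return None
    return inter / union, 2 * inter / (pred_n + truth_n)


def mean_foreground_iou(pred, truth, foreground_labels):
    """Average IoU over the foreground classes that appear in either mask."""
    scores = [iou_and_dice(pred, truth, lab) for lab in foreground_labels]
    ious = [s[0] for s in scores if s is not None]
    return sum(ious) / len(ious)
```

Because Dice counts the overlap twice in the numerator, it is never lower than IoU for the same mask pair, which is worth keeping in mind when comparing figures across papers that report only one of the two.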

16
Heart Rate Circadian Oscillations as Digital Biomarkers of Cardiometabolic Health Determinants

Colitta, A.; Bruno, S.; Benedetti, D.; Hoxhaj, D.; Cruz-Sanabria, F.; Di Pede, C.; Buracchi Torresi, F.; Frumento, P.; Gargani, L.; Fabbrini, M.; Maestri Tassoni, M.; Bonanni, E.; Faraguna, U.

2026-06-10 cardiovascular medicine 10.64898/2026.06.07.26355124 medRxiv
Top 2%
0.3%

AIMS Cardiometabolic risk factors may impair health by altering the autonomic modulation of the cardiovascular system, a physiological process described by heart rate (HR) circadian oscillations. However, the impact of cardiometabolic health determinants on HR circadian oscillations remains scarcely characterized in real-world, population-based settings. To address this, we applied digital health technologies to investigate how cardiometabolic health determinants shape HR circadian oscillations in a real-world cohort of individuals free of cardiometabolic diseases. METHODS First, a 10-fold cross-validation of a model was performed, aiming at mitigating wearables measurement error caused by motion artifacts. This process was informed by 10,056 epochs of concurrent wearable-derived and polysomnographic HR assessment, yielding an average 1.3 bpm reduction in wearables measurement error. We subsequently applied this model to over 2 million 1-minute epochs of HR data, derived from 7-day continuous actigraphic recordings of 245 individuals free of cardiometabolic disorders. Functional-on-scalar regression modelling and both parametric and nonparametric analyses characterized HR circadian profiles and their relationships with demographics, lifestyle, chronotype, sleep health, and chronic insomnia diagnosis. A 6-dimension sleep health index was calculated. RESULTS Sex, chronotype, and sleep health predominantly shaped HR circadian oscillations. In detail, females consistently showed higher HR across the 24 hours. Moreover, chronotype was associated with a phase shift in HR circadian profiles, with later timings corresponding to eveningness. Notably, sleep health impacted HR circadian oscillations in a dose-dependent fashion: each additional impaired sleep dimension was associated with a 1.2 bpm HR increase during nighttime, alongside reduced circadian robustness and delayed oscillation timings. 
Finally, the earlier occurrence of morning HR peaks served as a digital biomarker of insomnia (80% specificity, 74% sensitivity). CONCLUSIONS This work provides a digital health framework to characterize HR circadian oscillations in free-living populations and supports its clinical utility in capturing the autonomic disruptions related to cardiometabolic health determinants.

17
From Charting Burden to Workflow Signal: Retrospective Validation of Documentation-Density Measures for ICU Complexity and Long-Stay Risk

Collier, A.

2026-06-06 health informatics 10.64898/2026.06.04.26354922 medRxiv
Top 2%
0.2%

Background Electronic health record documentation patterns may reflect workflow complexity, monitoring intensity, and operational strain in intensive care settings. However, documentation-derived features can be sensitive to local documentation culture, data capture systems, and outcome definitions. Retrospective validation across multiple datasets is therefore needed before these signals are used in workflow intelligence or clinical AI governance tools. Objective To evaluate whether documentation-density and documentation-timing features show reproducible retrospective signal for ICU workflow complexity and long-stay proxy outcomes across de-identified critical care datasets, while distinguishing workflow and long-stay associations from unsupported claims about mortality prediction, burden reduction, or deployment readiness. Methods We synthesized retrospective validation results from de-identified ICU and workflow datasets generated through a prespecified documentation-density validation program. Feature families included Documentation Burden Score style features, Shift-End Documentation Rate style features, documentation reliability style metadata, and all-documentation feature sets where available. Outcomes included long ICU length of stay proxies, mortality where available, and workflow proxy endpoints. Models compared baseline feature sets with enhanced models containing documentation-density or workflow features. Performance was summarized using area under the receiver operating characteristic curve, Brier score where reported, delta AUROC, bootstrap confidence intervals where reported, and label-shuffle controls where available. Results The strongest external long-stay proxy evidence came from the NWICU chartevents analysis, which included 28,612 ICU stays, 20,267 stays with chart events, and 9,619,759 chart events. For ICU length of stay greater than the median, baseline AUROC was 0.5252. 
Enhanced AUROC was 0.9512 for Documentation Burden Score features, 0.9214 for Shift-End Documentation Rate features, 0.8470 for documentation reliability style features, and 0.9517 for all documentation features. Corresponding label-shuffle enhanced AUROCs were near random, ranging from 0.4897 to 0.5064. For ICU length of stay greater than the 75th percentile, baseline AUROC was 0.5155. Enhanced AUROC was 0.9433 for Documentation Burden Score features, 0.9194 for Shift-End Documentation Rate features, 0.8118 for documentation reliability style features, and 0.9427 for all documentation features, with label-shuffle enhanced AUROCs from 0.4836 to 0.4999. Additional retrospective support was observed in eICU workflow analyses, HiRID first-24-hour documentation-density analyses, MIMIC-IV HF ICU internal analyses, MIMIC-IV-Note metadata extensions, and nursing-chart or lab density proxy analyses. However, cross-institution discrimination transfer was weak without recalibration, and several analyses remained proxy validations rather than final clinical validations. Conclusions Documentation-density and documentation-timing features show promising retrospective signal for ICU workflow complexity and long-stay proxy outcomes, especially in NWICU chartevents and selected internal dataset-specific analyses. These findings support further preregistered, prospective, silent-mode validation of documentation-derived workflow intelligence. They do not establish prospective clinical performance, mortality reduction, clinician burden reduction, autonomous deterioration prediction, or deployment readiness.
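
The idea behind a label-shuffle control can be sketched in a few lines. This is a simplified illustration under stated assumptions: it shuffles labels only at evaluation time, whereas the "label-shuffle enhanced AUROCs" in the abstract presumably come from models refit on shuffled labels. The function names, repeat count, and seed are hypothetical.

```python
import random


def auroc(scores, labels):
    """AUROC as the probability that a positive outranks a negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def label_shuffle_auroc(scores, labels, n_repeats=200, seed=0):
    """Mean AUROC after randomly permuting the labels.

    A value near 0.5 indicates that the scores carry no signal once the
    link between scores and outcomes is broken.
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    total = 0.0
    for _ in range(n_repeats):
        rng.shuffle(shuffled)
        total += auroc(scores, shuffled)
    return total / n_repeats
```

A shuffled AUROC near 0.5 rules out some pipeline artifacts, but it does not rule out features that encode the outcome itself, such as documentation volume that grows with length of stay.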

18
Does ECG-Based AI Detect Aortic Stenosis Beyond Conventional LVH Criteria? An Analysis of the CLIDAS Database

Shimada, T.; Kodera, S.; Sawano, S.; Guan, J.; Saitoh, W.; Wakasa, S.; Ito, S.; Yanagishita, T.; Hayashi, Y.; Shibata, A.; Ito, A.; Otsuka, K.; Higashikuni, Y.; Okamura, H.; Tsujita, K.; Node, K.; Yamaguchi, O.; Makimoto, H.; Kabutoya, T.; Imai, Y.; Nakayama, M.; Sato, H.; Fujita, H.; Kohro, T.; Matoba, T.; Takeda, N.; Fukuda, D.; Nagai, R.

2026-06-08 cardiovascular medicine 10.64898/2026.06.07.26355087 medRxiv
Top 2%
0.2%

Background: Aortic stenosis (AS) is a progressive valvular disease associated with poor prognosis once symptoms develop, yet routine echocardiographic screening is impractical. While artificial intelligence (AI)-based electrocardiogram (ECG) models have shown promise for AS detection, it remains unclear whether they primarily reflect conventional left ventricular hypertrophy (LVH) voltage criteria or capture additional ECG features. Methods and Results: We developed a deep learning model using 244,816 ECGs from 51,713 patients across six academic institutions in Japan (CLIDAS database). AS labels were derived from inpatient Diagnosis Procedure Combination (DPC) codes. The model achieved an area under the receiver operating characteristic curve (AUC) of 0.849 (95% confidence interval 0.832-0.865) in the independent test cohort, with consistent performance across institutions, sex, and age. At a threshold of 0.1, sensitivity was 79.1%, specificity was 73.9%, and negative predictive value (NPV) was 98.0%. Conventional LVH voltage criteria (Sokolow-Lyon AUC 0.706; Cornell AUC 0.692) showed lower performance, and adding them to the AI model conferred no incremental benefit (AUC 0.849 vs. 0.847). Gradient-weighted class activation mapping (Grad-CAM) revealed predominant attention around QRS complexes in limb leads, beyond regions typically assessed in LVH evaluation. Conclusions: This multicenter AI-ECG model demonstrated strong discrimination for AS and captured ECG features beyond conventional LVH voltage criteria. The high NPV supports its use as a rule-out pre-screening tool.
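
The threshold-based metrics quoted above can be reproduced from scores and labels with a short sketch. It assumes a score at or above the threshold counts as a positive call, which the abstract does not specify; the function names are illustrative. The second function shows why a reported NPV depends on disease prevalence, a quantity the abstract does not state.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def screening_metrics(scores, labels, threshold):
    """Sensitivity, specificity, and NPV at a given decision threshold.

    Assumption: score >= threshold is treated as a positive prediction.
    """
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        positive_call = s >= threshold
        if y == 1 and positive_call:
            tp += 1
        elif y == 1:
            fn += 1
        elif positive_call:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "npv": _ratio(tn, tn + fn),
    }


def npv_from_rates(sensitivity, specificity, prevalence):
    """NPV implied by sensitivity, specificity, and prevalence (Bayes' rule)."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)
```

Holding sensitivity and specificity fixed, NPV falls as prevalence rises, so a rule-out tool validated in one population may perform differently where the condition is more common.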

19
Computer Vision for Real-Time Anatomical Navigation in Neurosurgery: First-in-Human Clinical Evaluation and Iterative Development (IDEAL Stage 1)

Khan, D. Z.; Mao, Z.; Wijekoon, A.; Das, A.; Williams, S. C.; Blandford, A.; Jain, A.; Harris, L.; Borg, A.; Dorward, N. L.; Clarkson, M.; Bano, S.; McCulloch, P.; Stoyanov, D.; Marcus, H.

2026-06-11 surgery 10.64898/2026.06.11.26355205 medRxiv
Top 2%
0.1%

Introduction: Precise anatomical navigation is fundamental to safe endoscopic pituitary surgery, a high-stakes procedure characterised by a challenging learning curve. While traditional navigation systems often rely on workflow-disrupting probes or static preoperative imaging, advancements in computer vision AI (CVAI) now enable dynamic, real-time anatomical segmentation directly from live surgical video [1-3]. Our group has previously conducted a series of preclinical human-computer interaction studies to refine the system's design, alongside digital and high-fidelity physical simulations demonstrating the benefit of AI assistance in improving overall performance, training, and safety [4-8]. Building on this foundation, the current study represents a first-in-human application of real-time CVAI assistance in the neurosurgical operating room, serving to assess feasibility and safety, and to iteratively improve the system. Method: Guided by DECIDE-AI and IDEAL frameworks, this single-centre evaluation comprises an initial proof-of-concept phase (n=6) for endoscopic transsphenoidal pituitary surgeries. The AI model utilised a DINOv3-derived vision transformer architecture, deployed via a high-performance edge computing unit to achieve low-latency, real-time inference without reliance on cloud infrastructure [2]. Given the high-risk nature of the procedure and the early stage of clinical AI integration, the system was initially deployed as an educational adjunct on a secondary monitor, ensuring the primary surgical feed remains uncompromised. Functionality and safety were assessed via structured questionnaire, prospective observation, and blinded retrospective review of the recordings of the endoscopic surgical video feed and wider operating room environment. Continuous multi-stakeholder feedback through validated human factors surveys drove iterative technical refinements between cases. Results: Six patients with pituitary adenomas were enrolled. 
The CVAI system was successfully deployed in four cases, demonstrating acceptable real-time sella segmentation accuracy. Deployment failed pre-operatively in two cases owing to a single recurring system reboot bug. Iterative refinements between cases were driven by our experience and surgical team feedback. This resulted in the integration of additional anatomical structure segmentations (e.g., carotid arteries), enhanced model accuracy via training dataset expansion, and hardware firmware upgrades. Multi-stakeholder surveys demonstrated satisfactory system feasibility, usability, and acceptability among the surgical team. Both prospective observation and retrospective video review confirmed the absence of adverse events, including no significant distraction to the primary surgeon, and there were no AI-related clinical complications. Conclusion: This first-in-human early clinical evaluation demonstrates the feasibility, safety and iterative development of real-time, CVAI-based anatomical navigation during high-stakes neurosurgery. Future work will include a larger single-centre case series (IDEAL Stage 2a) with more surgical teams to further iterate the system and explore its impact on training and workflow. As the underpinning technology improves, deployment will transition to direct intra-operative decision support and integration with other intra-operative navigational technologies.

20
A Heterogeneous Graph Neural Network Framework for Multi-Horizon Stroke Mortality Prediction

Tharzeen, A.; Vafaei Sadr, A.; Radfar, N.; Hwang, W.; Abedi, V.; Zand, R.

2026-06-10 health informatics 10.64898/2026.06.09.26355176 medRxiv
Top 3%
0.1%

Background: Machine learning models for stroke mortality prediction typically treat each time horizon independently and use flat tabular features that ignore the relational structure of electronic health records (EHRs). In this pilot study, we leveraged graph-based machine learning models to predict post-stroke all-cause mortality across three different time horizons. Methods: We developed Stroke Temporal Heterogeneous Graph (StrokeTHG), a heterogeneous graph neural network model for simultaneous multi-horizon stroke mortality prediction (30-day, 90-day, 1-year) using EHR data from Penn State Health System. The model encodes various relations among EHR entities (e.g., patient, diagnosis, comorbidity) and temporal encoding of admission time to better predict stroke mortality. We compared our proposed approach against various baseline methods, including Logistic Regression, Random Forest, and XGBoost. We also performed ablation and subgroup analyses, evaluated the quality of learned graph embeddings, and assessed the importance of different edge types in the graph. Results: We included 4,144 stroke patients (mean age 69.2 years; 54.3% men), of whom 3,332 (80.4%) survived their stroke after one year. 30-day, 90-day, and 1-year mortality rates were 9.7%, 13.7%, and 19.6%, respectively. Our proposed approach, StrokeTHG, achieved AUROC of 0.872, 0.878, and 0.837 across horizons, outperforming all tabular baselines. At ≥75% specificity, the model identified 5-10 percentage points more mortality cases than the best baseline at each horizon. Subgroup analysis demonstrated consistent performance across sex subgroups and the largest discriminative gains in the Age 65-80 stratum. Edge-type ablation identified phenotype-patient and admission-patient edges in the constructed EHR graph as the most influential relational edges for mortality prediction. 
StrokeTHG embeddings outperformed all graph and matrix factorization baselines under an identical downstream classifier, confirming that performance gains stem from representation quality rather than classifier capacity. Conclusions: StrokeTHG demonstrates that heterogeneous graph representations of EHR data provide a consistent improvement over flat tabular models for multi-horizon stroke mortality prediction, with particular advantage at clinically actionable sensitivity thresholds and novel multi-horizon monotonic prediction capability. This methodological framework may be adaptable to other EHR-based clinical research studies seeking to leverage heterogeneous relational structures for predictive modeling.